Run your own LLM

NexaAI: Hub for On-Device AI

Featuring a Local File Organizer that can automatically rename and organize local files based on their content.

NexaSDK: a comprehensive toolkit supporting ONNX and GGML models. It covers text generation, image generation, vision-language models (VLM), automatic speech recognition (ASR), and text-to-speech (TTS).

Ollama

Eric Hartford describes, step by step, how to run a set of Ollama-based LLMs locally for conversation and chat. Model sizes start at about 3 GB and need 6 GB+ of RAM.

Run Ollama locally
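Once Ollama is running, it exposes a small HTTP API on localhost. Here is a minimal sketch in Python (standard library only) of calling it; it assumes the default port 11434 and that a model such as llama3 has already been pulled:

```python
# Minimal sketch: prompt a locally running Ollama server.
# Assumes Ollama is installed, `ollama pull llama3` has been run,
# and the server is listening on its default port (11434).
import json
import urllib.request

payload = {
    "model": "llama3",   # any model you have pulled locally
    "prompt": "Explain what a GGUF file is in one sentence.",
    "stream": False,     # return the whole answer as a single JSON object
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```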

Import your own Ollama models from GGUF, PyTorch or Safetensors and add them to your local Ollama library with these instructions.
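The import boils down to writing a Modelfile that points at your weights and running ollama create. A minimal sketch, driven from Python for consistency with the example above (the GGUF path and model name here are hypothetical):

```python
# Minimal sketch: import a local GGUF file into Ollama.
# Assumes the `ollama` CLI is installed; the file and model names are hypothetical.
import pathlib
import subprocess

gguf_path = pathlib.Path("./phi-2.Q4_0.gguf")

# An Ollama Modelfile only needs a FROM line pointing at the weights.
pathlib.Path("Modelfile").write_text(f"FROM {gguf_path}\n")

# Register the model under a local name, then chat with it.
subprocess.run(["ollama", "create", "my-phi2", "-f", "Modelfile"], check=True)
subprocess.run(["ollama", "run", "my-phi2", "Say hello in five words."], check=True)
```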

LM Studio

Run a model locally with LM Studio

Sample session with Phi. Note the resources consumed are listed at the bottom.

With LM Studio, you can …

🤖 - Run LLMs on your laptop, entirely offline
👾 - Use models through the in-app Chat UI or an OpenAI-compatible local server
📂 - Download any compatible model files from HuggingFace 🤗 repositories
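Because that local server speaks the OpenAI API, existing OpenAI client code can simply be pointed at it. A minimal sketch, assuming the server was started from the LM Studio UI on its default address (http://localhost:1234/v1) with a model loaded:

```python
# Minimal sketch: talk to LM Studio's OpenAI-compatible local server.
# Assumes the server is running on its default address with a model loaded.
from openai import OpenAI  # pip install openai

client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

reply = client.chat.completions.create(
    model="local-model",  # LM Studio answers with whichever model is loaded
    messages=[{"role": "user", "content": "Write a haiku about running LLMs offline."}],
)

print(reply.choices[0].message.content)
```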

But I find it’s pretty slow… on my 8 GB M2 Mac.

GPT4All

Open-source models from Nomic let you run large language models locally


I tried running this on my 8 GB M2 Mac with a very small model (Phi-2-GGUF). It runs quickly and without overly burdening the rest of my system.

Here’s an example of it writing code:

GPT4All running the very small Phi-2-GGUF model
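GPT4All also ships Python bindings, so the same model can be driven from a script. A minimal sketch (the exact Phi-2 file name is an assumption and may differ from what the desktop app downloaded):

```python
# Minimal sketch: run a small GGUF model through the gpt4all Python bindings.
# Assumes `pip install gpt4all`; the Phi-2 file name below is an assumption.
from gpt4all import GPT4All

model = GPT4All("phi-2.Q4_0.gguf")  # downloads the model if it isn't cached locally

with model.chat_session():
    print(model.generate("Write a Python function that reverses a string.", max_tokens=200))
```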

Chinese translation

Ethically dubious